Using differential item functioning to evaluate potential bias in a high stakes postgraduate knowledge based assessment
Abstract
BACKGROUND Fairness is a critical component of defensible assessment. Candidates should perform according to ability without influence from background characteristics such as ethnicity or sex. However, performance differs by candidate background in many assessment environments. Many potential causes of such differences exist, and examinations must be routinely analysed to ensure they do not present inappropriate progression barriers for any candidate group. By analysing the individual questions of an examination through techniques such as Differential Item Functioning (DIF), we can test whether a subset of unfair questions explains group-level differences. Such items can then be revised or removed.
METHODS We used DIF to investigate fairness for 13,694 candidates sitting a major international summative postgraduate examination in internal medicine. We compared (a) ethnically white UK graduates against ethnically non-white UK graduates and (b) male UK graduates against female UK graduates. DIF was used to test 2773 questions across 14 sittings.
RESULTS Across 2773 questions, eight (0.29%) showed notable DIF after correcting for multiple comparisons: seven medium effects and one large effect. Blinded analysis of these questions by a panel of clinician assessors identified no plausible explanations for the differences. These questions were removed from the question bank and we present them here to share knowledge of questions with DIF. These questions did not significantly impact the overall performance of the cohort. Group-level differences in performance between the groups we studied in this examination cannot be explained by a subset of unfair questions.
CONCLUSIONS DIF helps explore fairness in assessment at the question level. This is especially important in high-stakes assessment, where a small number of unfair questions may adversely impact the passing rates of some groups. However, very few questions exhibited notable DIF, so differences in passing rates for the groups we studied cannot be explained by unfairness at the question level.
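As an illustration of what a question-level DIF check can look like, the sketch below computes a Mantel-Haenszel common odds ratio for a single item and converts it to the ETS delta scale. This is an assumption made for illustration: the abstract does not state which DIF statistic or which multiple-comparison correction the study used. The function names, the "ref"/"focal" group labels, and the matching strata are all hypothetical, and the effect-size labels use magnitude thresholds only (the significance tests that normally accompany them are omitted).

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses, groups, strata):
    """Return (common odds ratio, ETS delta) for one item.

    responses: 1 (correct) or 0 (incorrect) per candidate
    groups:    "ref" or "focal" per candidate (hypothetical labels)
    strata:    matching variable per candidate, e.g. a banded total score
    """
    # One 2x2 table per stratum:
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for r, g, s in zip(responses, groups, strata):
        idx = (0 if g == "ref" else 2) + (0 if r == 1 else 1)
        tables[s][idx] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")

    alpha = num / den                  # > 1: item favours the reference group
    delta = -2.35 * math.log(alpha)    # ETS delta scale (MH D-DIF)
    return alpha, delta


def effect_size_label(delta):
    """Label by magnitude only; a simplification of the ETS A/B/C scheme."""
    size = abs(delta)
    if size < 1.0:
        return "negligible"
    if size < 1.5:
        return "medium"
    return "large"


# Toy data: within one ability stratum, both groups answer identically.
alpha, delta = mantel_haenszel_dif(
    [1, 1, 0, 0, 1, 1, 0, 0],
    ["ref"] * 4 + ["focal"] * 4,
    ["low"] * 8,
)
print(alpha, effect_size_label(delta))
```

In a real analysis this would be run once per question, with the resulting p-values adjusted for the number of questions tested before any item is flagged.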
Similar resources
Differential Item Functioning (DIF) in Terms of Gender in the Reading Comprehension Subtest of a High-Stakes Test
Validation is an important enterprise, especially when a test is high-stakes. Demographic variables like gender and field of study can affect test results and interpretations. Differential Item Functioning (DIF) is a way to make sure that a test does not favor one group of test takers over the others. This study investigated DIF in terms of gender in the reading comprehension subtest (35 i...
Selecting the Best Fit Model in Cognitive Diagnostic Assessment: Differential Item Functioning Detection in the Reading Comprehension of the PhD Nationwide Admission Test
This study was an attempt to provide detailed information on the strengths and weaknesses of test takers' real ability through cognitive diagnostic assessment, and to detect differential item functioning in each test item. The rationale for using CDA was that it estimates an item's discrimination power, whereas classical test theory or item response theory depicts between rather within item mu...
Interpreting the Validity of a High-Stakes Test in Light of the Argument-Based Framework: Implications for Test Improvement
The validity of large-scale assessments may be compromised, partly due to their content inappropriateness or construct underrepresentation. Few validity studies have focused on such assessments within an argument-based framework. This study analyzed the domain description and evaluation inference of the Ph.D. Entrance Exam of ELT (PEEE) sat by Ph.D. examinees (n = 999) in 2014 in Iran....
Issues Affecting Item Response Theory Fit in Language Assessment: A Study of Differential Item Functioning in the Iranian National University Entrance Exam
This study aimed at examining the issues affecting the use of IRT models in investigating differential item functioning in high stakes testing. It specifically focused on the Iranian National University Entrance Exam (INUEE) Special English Subtest. A sample of 200,000 participants was randomly selected from the candidates taking part in the INUEE 2003 and 2004 respectively. The data collected ...
Gender-based DIF across the Subject Area: A Study of the Iranian National University Entrance Exam
This study aimed at investigating differential item functioning (DIF) on the Special English Test of the Iranian National University Entrance Exam (INUEE). The effect of gender and subject area was taken into account. The study utilized a one-parameter IRT model with a sample of 36,000 students who sat for the INUEE Special English Test in 2004 and/or 2005. The findings confirmed the presence of D...